pro-iBiosphere - Important principles of identification and web integration: Identifier and Resolution

22.04.2014

by Kevin Richards

The topic of "stable unique identifiers" has had a varied history in the biodiversity informatics community in recent years. As technology and the approaches to storing and accessing information have changed rapidly, the community has changed direction several times.

In these changing times it seems that trying to stick to basic technologies, especially those that work with standard internet protocols, is the way to go. However, it is important to emphasise the two major components of identifiers: the IDENTIFIER and the RESOLUTION. These important principles of identification and web integration were put to use at the recent Biodiversity Enrichment Hackathon that took place on 17-21 March 2014 in Leiden. The importance of identification and resolution is obvious when attempting to link various data sets and information sources in the Biodiversity domain.

IDENTIFIER for the data

The first issue for any user of data is the need to identify a particular piece of data. This has traditionally been done using fairly local identifiers such as a number counter (e.g. 1, 2, 3...). With the need to integrate and access data globally, other mechanisms have been required. The simplest of these is the Universally Unique Identifier (UUID). UUIDs are hard to read and quite unappealing to look at, for example "1696AC49-548F-404D-9DEA-8A1C4DDA37F4", but they are still a good mechanism for identifying data in a computer system and hence work well for computer needs.
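As a minimal sketch of how such identifiers are created and read, the following uses Python's standard uuid module. The choice of language and the variable names are assumptions made for illustration only; the example string is the UUID quoted above.

```python
import uuid

# Mint a new random (version 4) UUID. No central registry is consulted,
# which is what makes this approach simple to adopt for local data.
new_id = uuid.uuid4()
print(str(new_id).upper())

# Parse the example UUID from the text; parsing is case-insensitive.
example = uuid.UUID("1696AC49-548F-404D-9DEA-8A1C4DDA37F4")
print(example.version)
```

Note that a UUID on its own only identifies a record; it says nothing about where the record can be fetched from.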

RESOLUTION of data by their identifiers

With the increasing demand to have data accessible and linked on the web, identifiers alone are not enough: mechanisms are also required to allow data to be fetched via their identifiers. Within the biodiversity community several approaches have been taken. Originally LSIDs (Life Science Identifiers) were promoted as they had several appealing features, namely a degree of indirection from the domain name associated with the data host and a defined protocol for accessing the data and metadata for a particular object. Other identifier systems were also considered, such as DOIs, PURLs and Handles. The main benefit of all these identifier systems is that the data is then accessible over the web using web technologies.

Then along came the semantic web, with some really cool ideas about linking data together in a meaningful way and building a giant set of data that can be reused and repurposed. This has become really appealing to biodiversity informaticians, and has consequently raised some interesting hurdles on the way to achieving these attractive ambitions. The main one is that semantic web technologies depend heavily on automation and on basic web protocols for harvesting and linking data, so any identifier system that doesn't work well with basic HTTP is difficult to integrate. As a result, LSIDs have fallen out of favour because of their reasonably complex resolution protocol. Instead, basic stable permanent URLs have been promoted.
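To illustrate why plain HTTP identifiers suit automated harvesting, here is a minimal sketch of resolving one with Python's standard library. The example.org URI is hypothetical, and the assumption that a server answers an Accept header with RDF (content negotiation) is an assumption too: whether a given data host supports it has to be checked case by case.

```python
from urllib.request import Request, urlopen


def build_request(uri: str, media_type: str = "application/rdf+xml") -> Request:
    """Build a plain HTTP GET for an identifier, asking for a machine-readable form."""
    return Request(uri, headers={"Accept": media_type}, method="GET")


def resolve(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference the identifier and return the response body (needs network access)."""
    with urlopen(build_request(uri), timeout=timeout) as response:
        return response.read()


# Hypothetical identifier, used only to show the shape of the request.
request = build_request("http://example.org/records/1696AC49-548F-404D-9DEA-8A1C4DDA37F4")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Nothing beyond a standard HTTP client is needed here, which is the practical advantage over identifier schemes that require their own resolution protocol.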

A good approach to using this type of identifier is to first pick a very agnostic domain name, i.e. not an institution or university name, but perhaps a "project" name. A good example of this is the International Plant Names Index project, also known as IPNI (its data system is hosted by the Royal Botanic Gardens, Kew, London). Then a locally unique identifier portion is attached to the chosen domain name. An example of this combination is ZooBank, with its zoobank.org domain name and an identifier for a particular piece of data it hosts, e.g. http://zoobank.org/NomenclaturalActs/8BDC0735-FEA4-4298-83FA-D04F67C3FBEC is a resolvable identifier for the ZooBank record for the taxon "Chromis abyssus".
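The combination of a project-level base URL and a locally unique part can be sketched as follows. The helper names mint_uri and local_part are hypothetical, and the sketch only assumes that the local part is a UUID, as in the ZooBank example above; it does not describe how ZooBank itself generates its identifiers.

```python
import uuid
from urllib.parse import urlsplit


def mint_uri(base: str, local_id: uuid.UUID) -> str:
    """Attach a locally unique identifier to an institution-agnostic base URL."""
    return base.rstrip("/") + "/" + str(local_id).upper()


def local_part(uri: str) -> uuid.UUID:
    """Recover the local identifier from the last path segment of the URI."""
    return uuid.UUID(urlsplit(uri).path.rsplit("/", 1)[-1])


# The resolvable identifier quoted in the text.
zoobank_uri = "http://zoobank.org/NomenclaturalActs/8BDC0735-FEA4-4298-83FA-D04F67C3FBEC"

record_id = local_part(zoobank_uri)
rebuilt = mint_uri("http://zoobank.org/NomenclaturalActs/", record_id)
print(rebuilt == zoobank_uri)
```

Keeping the base URL separate from the local part means the data can move to a different host institution without the identifier having to change, provided the project domain keeps resolving.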

The pro-iBiosphere project has created a Best Practices page for stable URIs that outlines some good approaches to creating identifiers for your data with consideration of semantic web requirements and the latest ideas on identification.
